1.
Sci Rep ; 12(1): 15704, 2022 09 20.
Article in English | MEDLINE | ID: covidwho-2036891

ABSTRACT

Natural language processing (NLP) algorithms process linguistic data in order to discover the associated word semantics and develop models that can describe or even predict the latent meanings of the data. The applications of NLP become multi-fold when dealing with dynamic or temporally evolving datasets (e.g., historical literature). Biological datasets of genome sequences are interesting since they are sequential as well as dynamic. Here we describe how SARS-CoV-2 genomes and mutations thereof can be processed using fundamental algorithms in NLP to reveal the characteristics and evolution of the virus. We demonstrate the applicability of NLP not only in probing the temporal mutational signatures through dynamic topic modelling, but also in tracing mutation associations through the semantic drift in genomic mutation records. Our approach also yields promising results in unfolding the mutational relevance to patient health status, thereby identifying putative signatures linked to known/highly speculated mutations of concern.
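The idea of following a mutation's "semantic drift" can be illustrated with a minimal sketch. It assumes each genome is reduced to a list of mutation labels (a "document" of "words") and measures how much a mutation's co-occurrence context changes between two time periods. This is a simplified stand-in, not the method of the paper: the function names and the toy records are hypothetical, and a set-based Jaccard distance replaces the embedding-based comparison that NLP drift analyses normally use.

```python
from collections import defaultdict


def cooccurrence_contexts(records):
    """Map each mutation to the set of mutations it co-occurs with."""
    contexts = defaultdict(set)
    for mutations in records:
        for m in mutations:
            contexts[m].update(x for x in mutations if x != m)
    return contexts


def context_drift(early_records, late_records, mutation):
    """Return 1 - Jaccard similarity of a mutation's context across two periods.

    0.0 means the co-occurring mutations are identical in both periods,
    1.0 means the two contexts share nothing.
    """
    early = cooccurrence_contexts(early_records).get(mutation, set())
    late = cooccurrence_contexts(late_records).get(mutation, set())
    union = early | late
    if not union:
        return 0.0
    return 1.0 - len(early & late) / len(union)


# Hypothetical toy records: one list of mutation labels per genome.
early = [["D614G", "P323L"], ["D614G", "P323L", "C241T"]]
late = [["D614G", "N501Y"], ["D614G", "N501Y", "P681H"]]

drift = context_drift(early, late, "D614G")
print(f"context drift for D614G: {drift:.2f}")
```

In this toy input the mutation keeps none of its earlier companions, so the drift score is at its maximum; real records would give intermediate values.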


Subject(s)
Genome, Viral , SARS-CoV-2 , COVID-19/virology , Humans , Mutation , SARS-CoV-2/genetics , Semantics
2.
J Mol Biol ; 434(15): 167684, 2022 08 15.
Article in English | MEDLINE | ID: covidwho-1885929

ABSTRACT

MOTIVATION: Continuous emergence of new variants through appearance/accumulation/disappearance of mutations is a hallmark of many viral diseases. SARS-CoV-2 variants in particular have exerted tremendous pressure on the global healthcare system owing to their life-threatening and debilitating implications. The sheer plurality of variants and the huge scale of genomic data have added to the challenges of tracing the mutations/variants and their relationship to infection severity (if any). RESULTS: We explored the suitability of virus-genotype guided machine-learning in infection prognosis and identification of features/mutations-of-interest. A total of 199,519 outcome-traced genomes, representing 45,625 nucleotide-mutations, were employed. Among these, post data-cleaning, Low and High severity genomes were classified using an integrated model (employing virus genotype, epitopic-influence and patient-age) with consistently high ROC-AUC (Asia:0.97 ± 0.01, Europe:0.94 ± 0.01, N.America:0.92 ± 0.02, Africa:0.94 ± 0.07, S.America:0.93 ± 03). Although virus-genotype alone could enable high predictivity (0.97 ± 0.01, 0.89 ± 0.02, 0.86 ± 0.04, 0.95 ± 0.06, 0.9 ± 0.04), the performance was not found to be consistent, and the models for a few geographies displayed significant improvement in predictivity when the influence of age and/or epitope was incorporated with virus-genotype (Wilcoxon p_BH < 0.05). Neither age, epitopic-influence, nor clade information could outperform the integrated features. A sparse model (6 features), developed using patient-age and epitopic-influence of the mutations, performed reasonably well (>0.87 ± 0.03, 0.91 ± 0.01, 0.87 ± 0.03, 0.84 ± 0.08, 0.89 ± 0.05). High-performance models were employed for inferring the important mutations-of-interest using Shapley Additive exPlanations (SHAP). The changes in HLA interactions of the mutated epitopes of reference SARS-CoV-2 were subsequently probed. 
Notably, we also describe the significance of a 'temporal-modeling approach' to benchmark the models linked with continuously evolving pathogens. We conclude that while machine learning can play a vital role in identifying relevant mutations and factors driving the severity, caution should be exercised in using the genotypic signatures for predictive prognosis.
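As an illustration of why a temporal benchmark matters for an evolving pathogen, the following minimal sketch splits samples by collection date (train on earlier genomes, test on later ones) and scores predictions with ROC-AUC computed from pairwise rankings. It is not the authors' pipeline: the sample fields (`date`, `severe`, `score`), the cutoff, and the toy values are hypothetical, and the scores stand in for the output of any trained classifier.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute ROC-AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def temporal_split(samples, cutoff):
    """Train on samples collected before the cutoff, test on the rest.

    Dates are ISO strings (YYYY-MM-DD), so string comparison is chronological.
    """
    train = [s for s in samples if s["date"] < cutoff]
    test = [s for s in samples if s["date"] >= cutoff]
    return train, test


# Hypothetical toy samples; "score" stands in for a model's predicted severity.
samples = [
    {"date": "2020-05-01", "severe": 0, "score": 0.20},
    {"date": "2020-09-15", "severe": 1, "score": 0.70},
    {"date": "2021-02-10", "severe": 0, "score": 0.10},
    {"date": "2021-03-22", "severe": 0, "score": 0.40},
    {"date": "2021-06-30", "severe": 1, "score": 0.35},
    {"date": "2021-08-05", "severe": 1, "score": 0.80},
]

train, test = temporal_split(samples, cutoff="2021-01-01")
auc = roc_auc([s["severe"] for s in test], [s["score"] for s in test])
print(f"train={len(train)} test={len(test)} later-period ROC-AUC={auc:.2f}")
```

A random split would mix early and late genomes in both sets, which can overstate how well a model generalizes to variants that emerge after training.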


Subject(s)
COVID-19 , Machine Learning , SARS-CoV-2 , Severity of Illness Index , COVID-19/virology , Genome, Viral/genetics , Genotype , Humans , Mutation , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity
3.
Virus Res ; 305: 198579, 2021 11.
Article in English | MEDLINE | ID: covidwho-1433887

ABSTRACT

The SARS-CoV2-mediated Covid-19 pandemic has impacted humankind at an unprecedented scale. While substantial research efforts have focused on understanding the mechanisms of viral infection and developing vaccines/therapeutics, factors affecting the susceptibility to SARS-CoV2 infection and manifestation of Covid-19 remain less explored. Given that the Human Leukocyte Antigen (HLA) system is known to vary among ethnic populations, it is likely to affect the recognition of the virus and, in turn, the susceptibility to Covid-19. To understand this, we used bioinformatic tools to probe all SARS-CoV2 peptides which could elicit a T-cell response in humans. We also tried to answer the intriguing question of whether these potential epitopes were equally immunogenic across ethnicities, by studying the distribution of HLA alleles among different populations and their share of cognate epitopes. Results indicate that the immune recognition potential of SARS-CoV2 epitopes tends to vary between different ethnic groups. While South Asians are likely to recognize a higher number of CD8-specific epitopes, Europeans are likely to recognize a higher number of CD4-specific epitopes. We also hypothesize and provide clues that the newer mutations in SARS-CoV2 are unlikely to alter the T-cell mediated immunogenic responses among the studied ethnic populations. The work presented herein is expected to bolster our understanding of the pandemic, by providing insights into the differential immunological response of ethnic populations to the virus as well as by gauging the possible effects of mutations in SARS-CoV2 on the efficacy of potential epitope-based vaccines through evaluating ∼40,000 viral genomes.


Subject(s)
COVID-19/immunology , Epitopes, B-Lymphocyte/immunology , Epitopes, T-Lymphocyte/immunology , Ethnicity , Genome, Viral , HLA Antigens/immunology , SARS-CoV-2/immunology , Africa/epidemiology , Alleles , Amino Acid Sequence , Asia/epidemiology , CD4-Positive T-Lymphocytes/immunology , CD4-Positive T-Lymphocytes/virology , CD8-Positive T-Lymphocytes/immunology , CD8-Positive T-Lymphocytes/virology , COVID-19/epidemiology , COVID-19/genetics , COVID-19/pathology , Computational Biology/methods , Disease Susceptibility , Epitopes, B-Lymphocyte/classification , Epitopes, B-Lymphocyte/genetics , Epitopes, T-Lymphocyte/classification , Epitopes, T-Lymphocyte/genetics , Europe/epidemiology , HLA Antigens/classification , HLA Antigens/genetics , Humans , Middle East/epidemiology , Oceania/epidemiology , Principal Component Analysis , RNA, Viral/genetics , RNA, Viral/immunology , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity